Long-read sequencing of extrachromosomal circular DNA and genome assembly of a Solanum lycopersicum breeding line revealed active LTR retrotransposons originating from S. Peruvianum L. introgressions

Transposable elements (TEs) are a major force in the evolution of plant genomes. Differences in the transposition activities and landscapes of TEs can vary substantially, even in closely related species. Interspecific hybridization, a widely employed technique in tomato breeding, results in the creation of novel combinations of TEs from distinct species. The implications of this process for TE transposition activity have not been studied in modern cultivars. In this study, we used nanopore sequencing of extrachromosomal circular DNA (eccDNA) and identified two highly active Ty1/Copia LTR retrotransposon families of tomato (Solanum lycopersicum), called Salsa and Ketchup. Elements of these families produce thousands of eccDNAs under controlled conditions and epigenetic stress. EccDNA sequence analysis revealed that the major parts of eccDNA produced by Ketchup and Salsa exhibited low similarity to the S. lycopersicum genomic sequence. To trace the origin of these TEs, whole-genome nanopore sequencing and de novo genome assembly were performed. We found that these TEs occurred in a tomato breeding line via interspecific introgression from S. peruvianum. Our findings collectively show that interspecific introgressions can contribute to both genetic and phenotypic diversity not only by introducing novel genetic variants, but also by importing active transposable elements from other species. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-024-10314-1.


Introduction
Transposable elements are ubiquitous components of plant genomes.LTR retrotransposons (LTR-RTEs) are among the highest copy members of the plant mobilome [1,2], accounting for 76% of the rye genome [3] and 50% of the tomato genome [4].Uncontrolled TE reactivation can lead to genetic instability.Therefore, plants have a complex system of epigenetic control of TEs, including RNA-dependent DNA Methylation (RdDM) [5,6] and DDM1-mediated DNA methylation [2].TEs in plants carrying mutations in the genes involved in epigenetic regulation can be transcriptionally reactivated.For example, a threefold increase in TE transcription was observed in a triple mutant of A. thaliana with mutations in three key TE-controlling genes, ddm1, rdr6, and polV [7].Similarly, transposition of some TE families has been observed in the DNA methylation-free A. thaliana mutant [8], which is likely due to the redistribution of histone modifications [9].
TE transposition can also occur in wild type plants under natural conditions.Such natural TE activity has contributed significantly to the evolution, adaptation, and domestication of plants [10].Bursts in LTR retrotransposon activity have had a key impact on the size of the genomes of some plant species [11][12][13], whereas individual insertions have created functionally altered or new alleles of genes [14,15].More specific examples include the emergence of traits such as seedless apples [16], grape skin color [17], and red orange pulp [18].In tomatoes, the insertion of elements of the Rider family resulted in an elongated fruit shape [19,20], yellow flesh fruit [21], and lack of formation of a detachment zone in the peduncle [22].A study on A. thaliana also provided an interesting example of the contribution of TE insertion to plant adaptation to a novel ecological environment.A rare variant of the A. thaliana FLC locus was found to contain an intronic insertion of a heat-inducible element in the ONSEN family, which may be an adaptation to flowering in the absence of vernalization [23].Whole genome sequencing of A. thaliana ecotypes revealed that hundreds of TEs generate novel insertions [23].Despite this, the ability to control the activation of specific TEs has only been possible for a small set of elements, such as the heat-induced ONSEN LTR retrotransposon of A. thaliana and the plant tissue culture-triggered Tos17 LTR retrotransposon of rice [24].
In addition to environmental factors, TE activity can be triggered by 'genomic shock' as proposed by McClintock [25].Genomic shock can result from chromosomal rearrangements and interspecific hybridization.Transcriptional reactivation of LTR-RTE has been detected during long-distance hybridization in various plant species, including rice [26], wheat [27], Arabidopsis [28], and wild potato species [29].In addition, although much less frequently, hybridization-induced LTR-RTE transposition in real time has been detected in rice [30], poplar [31], and potato [32].The consequences of interspecific hybridization on TE composition in the genome have been well described for the Solanum genus.Interspecific hybridization is one of the main sources of genetic diversity in tomato (Solanum lycopersicum L.) breeding and domestication.A wide range of wild species has been implicated in this process, including S. peruvianum [33,34], S. chilense [35], and S. habrochaites [36], S. penellii [37] and S. pimpinellifolium [38].A recent study demonstrated that the TE composition of modern tomato cultivars is less divergent than that of wild species [39].At the same time, the domesticated tomato S. lycopersicum var.cerasiforme shows minor losses in the number of mobile TE families [39], which could potentially be due to recurrent hybridization with its closest wild relative, S. pimpinellifolium [40].Whether TEs located in interspecific introgressions maintain their activity during breeding and generate new insertions in the recipient genome has not been well studied.
In this study, we aimed to decipher inducible mobilome activity originating from TEs located at interspecific introgressions in the tomato genome.We performed a whole-genome analysis of TE activity using long-read nanopore sequencing of extrachromosomal circular DNAs from a tomato line.We found thousands of eccD-NAs mapped on members of families that we called 'Ketchup' and 'Salsa.' The eccDNA sequence analysis revealed that the major parts of eccDNA produced by Ketchup and Salsa did not fully align with any tomato (SL3.0)reference TEs, but were similar to the TEs from the S. peruvianum genome.We performed wholegenome nanopore sequencing and assembly for our tomato line and revealed large interspecific introgressions carrying members of the Ketchup and Salsa TE families.Our results suggested that active TEs introduced by interspecific hybridization may serve as an additional source of genetic diversity during plant breeding.

Mobilome-seq of a tomato breeding line
To unravel TEs capable of completing their life cycle, we performed nanopore (ONT) sequencing of extrachromosomal circular DNA (eccDNAs) of a tomato plant (Fig. 1).To collect more active TEs, we grew the plants in a special medium containing a mixture of zebularine and α-amanitin (A&Z).These chemicals lead to DNA methylation reduction and inhibition of Polymerase II, a major player in the PolII-RDR6 TE RdDM silencing pathway [41].Plants grown in MS medium without A&Z were used as controls to statistically evaluate eccDNA peaks.
In total, we obtained 64,000 and 48,000 reads for the A&Z and Control plants, respectively.The eccDNA reads were mapped to the reference genome (SL3.0),followed by intersection with LTR-RTE coordinates (4206 annotated LTR-RTEs) and manual curation.We found 101 L-RTEs, for which > 10 ONT reads were accounted.Of these, 38 L-RTEs demonstrated significant overrepresentation of ONT reads from the A&Z sample compared with the control sample (Fisher's exact test with multiple correction p-value < 0.01) (Fig. 2A).Phylogenetic analysis of these LTR-RTEs based on their LTR sequences revealed that they belong to two families, that we named 'Salsa' (a tomato-based sauce that dates back to the earliest tomato cultivators, the Aztecs and Mayans [42]) and 'Ketchup' (Figure S1).According to the GyDB database classification, the Ketchup elements belong to the Tork clade and are almost identical to CopiaSL_35 (average 98% identity with 99% coverage), whereas the Salsa family belongs to the Bianca clade and has some similarity to CopiaSL_25 (average about 72% identity with 41% coverage).Among the elements of each family, we selected one RTE with the highest CPM value: RTE976 (Sly11:3,646,824.3,652,356)from Salsa and RTE511 (Sly01:68,869,427.68,874,507) from Ketchup family.We checked for the presence of open reading frames in the genomic sequences of selected RTEs in the SL3.0 genome assembly, and both elements were found to have one or two long ORFs (Figure S2A).
Thus, nanopore Mobilome-Seq revealed that under epigenetic stress conditions (A&Z treatment), tens of LTR-RTEs belonged to two distinct families of tomato lines producing eccDNAs.

RTEs producing eccDNAs originate from S. peruvianum
Surprisingly, however, a detailed analysis of LTR-RTE coverage by ONT eccDNA reads revealed numerous SNPs (> 50 for RTE511 and > 100 for RTE976), distinguishing the LTR-RTE sequences of our tomato line from the reference.In addition, LTRs of RTE976 were not covered by eccDNA reads, and LTRs of RTE511 possessed a > 20 bp deletion based on read mapping (Fig. 2B).These observations show that the eccDNA reads most probably originated from RTEs that were not present in the reference genome sequence (SL3.0).To verify this, we performed a BLAST search for the most similar LTR sequences in the genomic assemblies of wild relatives of S. lycopersicum, including S. penellii, S. lycopersicum var.cerasiforme, S. pimpinellifolium, S. lycopersicoides, S. peruvianum, S. chilense, S. habrochaites.We used consensus LTR sequences deduced from the eccDNA reads mapped to RTE976/RTE511 as queries for the BLAST search.This analysis revealed that the most similar sequences were found in the S. peruvianum (SP) genome with > 96% identity to the query LTR sequences (Fig. 3A).

Ketchup-1 and Salsa-1 produce full-length eccDNAs with one or two LTRs
Individual RTE eccDNAs may represent different structural variants covering only a small LTR part, as well as whole RTEs [43].To assess the structure of eccDNAs produced by Ketchup-1 and Salsa-1, we investigated the monomers of individual eccDNA reads possessing concatemers.We found that a significant portion of eccD-NAs of Ketchup-1 contained only one LTR, with some eccDNAs also possessing small deletions (Fig. 4A).In turn, Salsa-1 produced full-length eccDNA with one or more LTR (Fig. 4B) and a small proportion of eccD-NAs containing truncated sequences (Figure S3).We then performed inverted PCR using specific primers for amplification of the LTR junction regions of eccDNA (Fig. 4C).For this experiment, genomic DNA from Control and A&Z samples before and after eccDNA enrichment and RCA were used.Weak and strong PCR products were obtained for solo-LTR eccDNAs produced by Ketchup-1 and Salsa-1 in control and Z&A samples, respectively (Fig. 4C).However, we were unable to detect extrachromosomal linear DNA (eclDNA) for Salsa-1 and Ketchup-1 in either the control or Z&A samples (Figure S4).
Altogether, the results demonstrate that Ketchup-1 and Salsa-1 RTEs produce eccDNAs with one or two LTRs under control and A&Z conditions.

Ketchup-1 and Salsa-1 invaded S. lycopersicum genome via an interspecific introgression
To determine how SP RTEs occurred in the genome of our tomato line, we performed whole-genome nanopore sequencing.We obtained 4,417,781 reads with a total length of 57.7Gb corresponding to ∼ 64x genome coverage of the tomato genome (1 C = 900Mb [44]),.We performed SNP calling and found significant biases in SNP density along SL chromosomes.The results unambiguously demonstrated a significantly high density of SNPs along full-length chromosomes 6 (2 Mb-32 Mb) and 9 (2 Mb-64 Mb) (Fig. 5; Figure S5).These results indicated that the genome of the tomato line used in this study possessed large interspecific introgressions.
To gain further insight into the origin of Salsa and Ketchup in our tomato line, we performed wholegenome assembly using only WGS nanopore reads.The assembly was performed using NextDenovo [45].The draft assembly resulted in an N50 around 16.7 Mb (16.721.217bp) and total length of approximately 800 Mb (813,056,715 bp) that is ∼ 100 Mb smaller than the predicted genome size of cultivated tomato The latter suggested that some highly abundant genomic repeats (e.g.telomere and centromere satellite repeats) can be collapsed during the assembly procedure.The quality of the draft assembly was verified using the BUSCO software [46].The percentage of the BUSCOs benchmark genes was high (> 98%) (C:99.3%[S:98.8%,D:0.5%],F:0.5%,M:0.2%,n:425).A comparison of the assembled genome and reference SL3.0 revealed that SL chromosomes 2, 4, and 7 were almost completely covered by two assembled contigs, further suggesting the relatively high contiguity of the draft genome (Figure S5).In line with the SNP density distribution, the comparison also revealed a low alignment rate between our assembly and chromosomes SL6 and SL9, pointing to the genomic differences between SL3.0 and the genome of our breeding line.
We next asked whether the introgressed SP RTEs in the genome of our tomato line are present in the same location in the SP genome.For this, we compared the RTEs with 2Kb flanking sequences between our assembly and SP, as well as SL (as a control) genome assemblies.This analysis revealed that Ketchup-1-1, Salsa-1-1, and Salsa-1-2 are not present in the same location in the SP and SL genomes (Figure S7).These results may suggest that the identified insertions in our genome assembly occurred during the plant breeding process.However, we cannot rule out the possibility that these RTEs were just not present in the sequenced SP plant.Thus, using WGS ONT data, we showed that Salsa-1 and Ketchup-1 occurred in the genome of our tomato line via interspecific hybridization and chromosomal introgression.

Discussion
Interspecific hybridization has been extensively used to introduce desirable genes from wild species into cultivated tomato [40].It has been known for a long time that interspecific hybridization may trigger TE reactivation and transposition [25].The TE composition of the tomato genome has been described previously [39].Here, we explored whether interspecific introgression might bring novel active TEs from other species.Using nanopore sequencing of eccDNAs, we described realtime mobilome activity that occurred under epigenetic Our results highlight how active transposons can be introduced into a new genome to maintain their activity for several generations.Indeed, hybridization-induced TE mutagenesis can be a major factor in the evolution of sexually reproducing organisms [47], and it has even been exploited for crop improvement [48].Interestingly, the population of transpositionally active TEs in wild tomato species is significantly larger than that in cultivated S. lycopersicum [39].Therefore, interspecific chromosomal introgressions in modern tomato varieties may carry active TEs.
Interestingly, we did not observe any novel insertions of introgressed RTEs in the genome of our breeding line.
This can be partially explained by the transposition of the original elements in the first stages of hybridization, followed by their subsequent elimination during backcrossing and selection.In contrast to the introgressed SP RTEs, we identified novel insertions for SL members of the Salsa and Ketchup families.It is interesting to speculate that the presence of active Salsa RTEs from S. peruvianum complemented SL RTEs to transpose, as has been shown for BARE-2 and Tos17 GAG-defective elements [49,50].
Short read sequencing has been frequently used for eccDNA detection [51][52][53].Utilization of long-read WGS and eccDNA sequencing allowed us to accurately determine the structure and full-length sequence of the eccDNAs.This allowed us to identify the positions of the active elements that are absent in the reference genome.Although the formation of eccDNA originating from RTEs has been considered a by-product of their activities [54], a recent study suggested that eccDNA is one of the key steps in the life cycle of RTEs [55].The concatameric structure of the eccDNA ONT reads allows distinguishing naturally occurring truncated sequences from DNA breaks that occur during the sample preparation procedure.This feature of eccDNA ONT data can help shed light on the composition and origin of eccDNA in cells [43,56].The authors showed that the ONSEN and EVD elements produced almost equal amounts of full-length (> 5 Kb) and truncated (< 1000 bp) eccDNAs in the ddm1 background.Interestingly, growing Arabidopsis on A&Z medium resulted in a shift in eccDNA composition toward truncated eccDNAs [43].These results are in contrast with our results for the Salsa and Ketchup elements.Here, we showed that Salsa and Ketchup TEs mainly produced full-length eccDNAs with one or two LTRs in tomato plants grown on A&Z media.These results suggest that eccDNA formation under similar growth conditions (for example, A&Z) may differ for different species and TEs.
In addition to the production of eccDNA under the relaxation of epigenetic control, Salsa and Ketchup also exhibited activity in the control sample, although to a lesser extent.EccDNA production poses a serious threat to genomic integrity and stability.The generation of eccDNA may result in genomic rearrangement via spontaneous reintegration into the genome, as has been shown for various types of eccDNA in eukaryotes [57].In addition, it has been suggested that a high load of eccDNA may alter DNA repair pathways, leading to new genetic variations [56].Additionally, eccDNAs may serve as a template for transcription of protein coding or non-coding RNAs further expanding the repertoire of possible consequences for the plant [58].For inheritance to the next generation, eccDNA-mediated genetic changes need to be produced in the plant 'germ line' cells, such as meristematic cells of the shoot apical meristem (SAM), pollen, or egg cells.However, RTE transcription and transposition are limited in these cells through specific epigenetic mechanisms [59].Thus, it remains an open question whether Salsa and Ketchup are capable of generating novel genetically inherited insertions, and whether their eccDNAs contribute to genome instability.This question could be answered by genomic analysis of M1 plants, which will be the subject of our future research.

Conclusion
Using nanopore whole-genome and eccDNA sequencing, we identified two novel families of tomato TEs, Salsa and Ketchup, that produce eccDNAs under both control and epigenetic stress conditions.We showed that these TEs occur in a tomato breeding line via interspecific introgression from S. peruvianum.Collectively, our results demonstrate that interspecific introgression may contribute to genetic and phenotypic diversity not only by providing new genetic variants, but also by bringing new active TEs from other species.

Plant material and in vitro growth conditions
Seeds of tomato line '812/18' used in this study were kindly provided by Tereshonkova Tatyana Arkadyevna (All-Russian Research Institute of Vegetable Production, Moscow, Russia).Tomato plants were grown on ½ MS medium supplemented with 4 mg/ml α-amanitin and 8 mg/ml zebularine for two weeks under a long-day photoperiod (16/8).

eccDNA isolation and sequencing
For eccDNA isolation we used the techniques described by Lanciano et al. [51] and Wang et al. [60] with modifications.Briefly, to remove linear DNA, 1 µg of total DNA was treated with 1 µl (10 U/µl) of PlasmidSafe DNase supplemented with 2 µl of ATP (25 mM) and 5 µl of 10× PlasmidSafe buffer in a volume of 50 µl.The reaction was incubated for 72 h with additional reagents (0.1 µl enzyme, 0.2 µl ATP, 0.3 µl buffer) was added every 24 h, followed by incubation at 72 °C for 30 min.Precipitation of eccDNA was carried out by overnight incubation at -20 °C in the presence of 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 × volume of absolute ethanol, followed by centrifugation at 12,000 × g for 30 min.The eccDNA pellet was washed with ice-cold 70% ethanol and dissolved in 10 µl of deionized nuclease-free water.For eccDNA amplification using random RCA, 2 µL phi29 polymerase (Thermo Scientific, EP0091), 2 µL 10× phi29 reaction buffer, 5 µL 10 mM dNTP, and 1 µL 500 µM exo-resistant oligo (NpNpNpNpNpSNpSN, where p is phosphodiester and pS is the phosphorothioate group) with the addition of nuclease-free water to a final volume of 20 µL.The reaction was preheated to 95 °C for 5 min, ramped to 30 °C at a 1% ramp rate on a thermocycler, and incubated for 36 h at 30 °C.The enzyme was inactivated by heating the mixture at 65*C for 10 min.For debranching, 500 ng of RCA amplicons were treated with T7 endonuclease 5 µL of 10× reaction buffer and 1 µL of T7 endonuclease I (New England Biolabs, M0302S) in a 50 µL reaction volume.After incubation at 37 °C for 15 min, the reaction was stopped immediately and purified by adding an equal volume of chloroform.The Debranched RCA product was precipitated by adding 1/10V 3 M sodium acetate (pH 5.2) and absolute ethanol (2.5 V), 1 min.The resulting amplicons were separated on 1.5% agarose gel at 80 V for 60 min.

Bioinformatic analysis of eccDNA sequencing and data visualization
Raw eccDNA nanopore reads were mapped to the SL3.0 genome using the minimap2 software [64] with the following parameters: -ax map-ont -t 100.The obtained SAM files were converted to BAM format, sorted, and indexed using SAMtools [65].To obtain the eccDNA peaks, the obtained sorted bam files were analyzed using the eccStructONT pipeline, as previously described [43].

Fig. 2
Fig. 2 Analysis of eccDNA production by LTR-RTEs assessed using ONT sequencing.(A) Normalized (reads per 100,000 ONT reads) count of eccDNA reads from Z&A samples for 4206 RTEs located on chromosomes of SL3.0 genome assembly; (B) Coverage of RTE976 (Salsa family) and RTE511 (Ketchup family) by eccDNA reads.Orange and blue colors indicate eccDNA reads from the Control and Z&A samples

Fig. 3
Fig. 3 Phylogenetic and structural analyses of Ketchup and Salsa RTEs.(A) Alignment of LTR sequences from eccDNA and the genome assemblies of diverse tomato species.(B) Structure and open reading frames of two RTEs (Ketchup-1 and Salsa-1) in the S. peruvianum genome

Fig. 4
Fig. 4 Analysis of the structure of Salsa-1 and Ketchup-1 eccDNA.Dot plot from alignment of eccDNA deduced monomer sequences against full-length reference Ketchup-1 (A) and Salsa-1 (B) sequences.(C) Primer positions (top) and gel electrophoresis (bottom) for inverted PCR with genomic DNA from Control and A&Z samples before and after eccDNA enrichment and RCA

Fig. 5
Fig. 5 Whole-genome nanopore sequencing of the analyzed tomato line.SNP density deduced from alignment of ONT WGS reads of the studied tomato line on SL3.0 genome assembly; circles and triangles indicate original TEs and their insertions; rings represent eccDNAs produced by Ketchup and Salsa of S. peruvianum